Abstract

Background: Escherichia coli are a frequent cause of urinary tract infections (UTI) and are thought to have a
food-borne origin. E. coli with sequence type 127 (ST127) are emerging pathogens increasingly implicated as a cause of
UTI globally. An ST127 isolate (2009-46) resistant to ampicillin and trimethoprim was recovered
from the urine of a 56-year-old patient with a UTI at a hospital in Sydney, Australia, and is characterised here.

Results: We sequenced the genome of Escherichia coli 2009-46 using the Illumina Nextera XT and MiSeq
technologies. Assembly of the sequence data reconstructed a 5.14 Mbp genome in 89 scaffolds with an N50 of 
161 kbp. The genome has extensive similarity to other sequenced uropathogenic E. coli genomes, but also has several 
genes that are potentially related to virulence and pathogenicity that are not present in the reference E. coli strain. 

Conclusion: E. coli 2009-46 is a multiple antibiotic resistant, phylogroup B2 isolate recovered from a patient with a
UTI. This is the first description of a drug resistant E. coli ST127 in Australia.

Background

Escherichia coli infections of the urinary tract are among 
the most frequent infections reported in the developed 
world with an estimated 130-175 million cases per annum 
worldwide [1]. E. coli that cause urinary tract infections
(UTI) are classified as uropathogenic Escherichia coli 
(UPEC), a subgroup of extraintestinal pathogenic E. coli 
(ExPEC). ExPEC also cause a range of afflictions including 
meningitis, septicaemia, and pneumonia and are genotypically
and phenotypically distinct from diarrhoeagenic E.
coli (DEC) [2]. ExPEC are thought to be acquired orally
via the consumption of contaminated food and are considered
to be zoonotic pathogens [3-5]. The emergence of
multiple antibiotic resistance among ExPEC poses a serious
health threat because antibiotics are an important treatment
strategy for controlling UTI.

Multilocus sequence typing (MLST) is currently the
gold standard for characterising E. coli causing UTI.
No clear diagnostic markers are available for identifying
E. coli causing UTI, but several sequence types
(ST) including ST131, ST405, ST95, ST65, ST127, and
ST10 are recognised UTI pathogens [6]. ExPEC ST127
are described as community-acquired and highly virulent
zoonotic pathogens [3,6], but to our knowledge there
are no genome sequences representing antibiotic-resistant
isolates of this emerging pathogen. Studies of E. coli
causing UTI in Australia have focussed on characterising
ST131 [7,8] and serogroup O75 isolates belonging to
clonal complex 14 [9].

Here we describe the genome sequence of E. coli ST127
isolate 2009-46, a mid-stream urinary tract isolate from
a 56-year-old patient at the Sydney Adventist Hospital
(SAN clinic) that is resistant to ampicillin and trimethoprim.

Methods 

The isolate was supplied on a Sensi-agar plate from the
SAN laboratories in Sydney, Australia. To confirm pure
culture, a loopful of the isolate was streaked onto a Luria-Bertani
(LB) agar plate and incubated at 37°C. A single
colony was picked from the plate and subcultured in
10 mL LB broth at 37°C overnight. To prepare a glycerol
stock for long-term storage at -80°C, 7 mL of the overnight
culture was used, and genomic DNA was prepared from the
remaining 3 mL. Genomic DNA for sequencing was prepared
using the ISOLATE II gDNA extraction kit from Bioline.

Genome sequencing 

DNA was quantified using Qubit fluorometry and 0.5 ng of
gDNA was used as template to construct the sequencing
library, using the Illumina Nextera XT library preparation
protocol following the manufacturer's instructions.
However, the "PCR Clean-Up" and "Library Normalization"
steps were omitted and size selection was instead
performed by running balanced and pooled samples in a
1% agarose gel and excising the 600 bp to 1200 bp region
of interest. The DNA was then purified from the agarose
using Promega's Wizard SV Gel and PCR Clean-Up System.
Finally, an Agilent 2100 Bioanalyzer, with a High
Sensitivity DNA Kit, was used to quantitate the pooled
DNA library before loading onto the MiSeq with other
multiplexed samples. Two MiSeq runs were carried out,
one with paired-end 250 nt reads on MiSeq V2 chemistry
and another with paired-end 300 nt reads on V3
chemistry. The first library was found to have an average
insert size of 368 ± 157 nt, while the second library had
inserts with an average size of 497 ± 118 nt.

Assembly and annotation 

The genome was assembled using the A5-miseq pipeline,
a version of the A5 pipeline [10] that has been revised
to process reads up to 500 nt long. Briefly, the A5-miseq
pipeline consists of five stages: (1) read quality filtering
and error correction, (2) contig assembly, (3) permissive
draft scaffolding, (4) misassembly detection, and (5) conservative
scaffolding. The revised A5 pipeline uses a new
version of idba_ud that uses read pairing information,
and that has been modified to accept reads up to 500 nt
long and to construct de Bruijn graphs with k-mers up to
500 nt. These modifications provide substantial improvements
in assembly contiguity.
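
To illustrate the de Bruijn graph construction mentioned above, the following is a minimal Python sketch, not the idba_ud implementation. It assumes error-free reads on a single strand; the function names and the toy reads are hypothetical. Each k-mer contributes one edge from its (k-1)-mer prefix to its (k-1)-mer suffix, so a longer k resolves repeats shorter than k at the cost of needing longer overlaps between reads.

```python
from collections import defaultdict


def kmers(read, k):
    """Yield every k-mer of a read, left to right."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer node to the list of (k-1)-mers that follow it."""
    graph = defaultdict(list)
    for read in reads:
        for kmer in kmers(read, k):
            # One edge per k-mer occurrence: prefix node -> suffix node.
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Toy reads (hypothetical), overlapping by five bases.
reads = ["ACGTAC", "CGTACG"]
graph = de_bruijn_graph(reads, k=4)
for node, successors in sorted(graph.items()):
    print(node, "->", successors)
```

Repeated edges (the same successor listed twice) record how often a k-mer was observed, which real assemblers use as a coverage signal when pruning errors.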

The genome was annotated with the RAST annotation 
system using FigFAM release 70 [11]. Putative antibiotic 
resistance genes and other genes of interest identified by 
RAST annotation were manually curated using the NCBI
ORF finder and iterative BLASTn and BLASTp searches. 

Quality assurance 

The A5 pipeline includes a quality checking step that 
detects putative misassemblies by identifying clusters of 
read pairs that map to disjoint locations in the assem- 
bled genome. This method did not detect any putative 
misassemblies. 
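
The idea behind this check can be sketched in a few lines of Python. This is an illustration only and not the A5 code: the input format, the thresholds (`max_insert`, `bin_size`, `min_pairs`) and the toy mappings are all assumptions made for the example. A pair counts as discordant when its mates map to different scaffolds or further apart than the library could plausibly span, and a bin holding several discordant pairs is reported as a candidate misassembly.

```python
from collections import Counter


def discordant_clusters(pairs, max_insert=1200, bin_size=500, min_pairs=3):
    """Return {(scaffold, bin_index): count} for bins rich in discordant pairs.

    pairs: iterable of (scaffold_a, pos_a, scaffold_b, pos_b) read-pair mappings.
    All thresholds are hypothetical values chosen for illustration.
    """
    counts = Counter()
    for scaf_a, pos_a, scaf_b, pos_b in pairs:
        disjoint = scaf_a != scaf_b or abs(pos_b - pos_a) > max_insert
        if disjoint:
            # Bin by the position of the first mate.
            counts[(scaf_a, pos_a // bin_size)] += 1
    return {key: n for key, n in counts.items() if n >= min_pairs}


# Toy mappings (hypothetical): three pairs near position 1000-1200 of s1
# have mates on s2; one isolated long-range pair is ignored as noise.
pairs = [
    ("s1", 100, "s1", 500),
    ("s1", 1010, "s2", 40),
    ("s1", 1100, "s2", 90),
    ("s1", 1200, "s2", 150),
    ("s1", 3000, "s1", 9000),
]
clusters = discordant_clusters(pairs)
print(clusters)
```

Requiring a cluster rather than a single pair keeps chimeric library fragments and spurious mappings from triggering false breaks.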

Initial findings 

Sequencing generated 1,702,236 read pairs for a total of
483,658,987 nt that were assembled to reconstruct the
5,139,229 bp genome of E. coli 2009-46 in 89 scaffolds,
with a scaffold N50 of 161 kbp and an N90 of 30.8 kbp. The
raw (unfiltered) coverage is 94x, and after read filtering
the assembly has a median depth of coverage of 61x. The
annotation of this assembly identified 5084 predicted CDS
and 106 predicted RNA genes. Nineteen genes were identified
as possibly missing from the assembly by the RAST system.
The overall functional profile of the genome is shown
in Figure 1.

Subsystem Feature Counts
Cofactors, Vitamins, Prosthetic Groups, Pigments (295)
Cell Wall and Capsule (275)
Virulence, Disease and Defense (120)
Potassium metabolism (31)
Photosynthesis (0)
Miscellaneous (52)
Phages, Prophages, Transposable elements, Plasmids (108)
Membrane Transport (248)
Iron acquisition and metabolism (57)
RNA Metabolism (247)
Nucleosides and Nucleotides (163)
Protein Metabolism (300)
Cell Division and Cell Cycle (38)
Motility and Chemotaxis (83)
Regulation and Cell signaling (163)
Secondary Metabolism (5)
DNA Metabolism (143)
Regulons (12)
Fatty Acids, Lipids, and Isoprenoids (142)
Nitrogen Metabolism (72)
Dormancy and Sporulation (7)
Respiration (181)
Stress Response (188)
Metabolism of Aromatic Compounds (3)
Amino Acids and Derivatives (410)
Sulfur Metabolism (58)
Phosphorus Metabolism (51)
Carbohydrates (829)

Figure 1 Subsystems in E. coli 2009-46.

We conducted a phylogenetic analysis of E.
coli 2009-46 using the PhyloSift software [12] to identify 
the most closely related organism with an available refer- 
ence genome. PhyloSift works by identifying homologs in 
the draft genome to a set of 37 genes that are universally 
conserved among bacteria and archaea and present in sin- 
gle copy. It then adds any homologs found in the draft 
genome to an existing multiple sequence alignment con- 
taining the 37 genes from a subset of all genomes publicly 
available in the NCBI and EBI databases that is chosen 
to span the phylogenetic diversity of these databases. The 
PhyloSift reference database includes only a single rep- 
resentative from groups of closely related organisms. To 
gain additional resolution in the Escherichia, we used Phy- 
loSift to construct a multiple alignment of the 37 marker 
genes from all finished E. coli genomes available in the 
NCBI database as of September 2013. We then inferred a 
phylogeny from that alignment using FastTree2 [13]. The 
resulting analysis, shown in Figure 2, identified E. coli 536
as the most closely related isolate with a finished genome
available, although there was some uncertainty in the 37
gene alignment as to whether E. coli 2009-46 diverged
on the same lineage as E. coli 536. We used the closely
related genome of E. coli 536 as a reference for further
comparative analysis.
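
The assembly statistics quoted at the start of this section (scaffold N50 and N90, raw coverage) follow from simple arithmetic. The Python sketch below shows the usual definitions. The function name `nx` and the toy scaffold lengths are hypothetical, since the individual scaffold lengths are not listed here; only the read total and genome size are taken from the text.

```python
def nx(lengths, fraction=0.5):
    """Length L such that scaffolds of length >= L contain at least
    `fraction` of the total assembly (N50 for 0.5, N90 for 0.9)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    raise ValueError("no scaffold lengths given")


# Toy scaffold lengths (hypothetical), total 300.
toy = [100, 80, 60, 40, 20]
n50 = nx(toy, 0.5)  # 100 + 80 = 180 >= 150, so N50 is 80
n90 = nx(toy, 0.9)  # 100 + 80 + 60 + 40 = 280 >= 270, so N90 is 40

# Raw coverage = total sequenced nucleotides / assembled genome size,
# using the figures reported in the text.
raw_coverage = 483_658_987 / 5_139_229
print(n50, n90, round(raw_coverage, 1))
```

The computed raw coverage of about 94.1 agrees with the 94x reported above. The lower median depth after filtering reflects reads and bases discarded by quality trimming and error correction.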

The scaffolds of E. coli 2009-46 were reordered to match
the order in the finished genome of the closely related
strain E. coli 536 using the Mauve Contig Mover [14].
After reordering, the genomes had 82 predicted rearrangement
breakpoints. Many of these cluster in regions
containing annotated transposase genes and multi-copy
transporter gene families, suggesting that either homology-mediated
rearrangement or misassembly has occurred
at these repetitive sequences. To further characterize
the structure of the genome we used the CGview web
server [15] to plot matches to annotated proteins and the
GC skew of the genome, with scaffolds ordered according
to the E. coli 536 reference. The CGview plot is shown
in Figure 3.

Figure 2 Phylogeny of E. coli and Shigella including the 2009-46 isolate. A phylogeny inferred on a concatenated set of codon alignments
from 37 universally conserved genes is shown, as calculated by PhyloSift [12] and FastTree2 [13]. The phylogeny has been rooted on the branch
leading to Salmonella and internal nodes are labeled with SH-like support values.

Of note, the GC skew in the E. coli 2009-46
genome appears to fluctuate frequently. This pattern is in
sharp contrast to the GC skew of the E. coli 536 reference,
which shows a strong pattern coinciding with the chromosome's
replication arms (data not shown). This suggests
that E. coli 2009-46 has undergone substantial
genome rearrangement in the recent past, that the true
genome arrangement may not match the E. coli 536 reference
very closely, that undetected misassembly errors
exist in the E. coli 2009-46 genome, or that some combination
of these three situations exists. We note that our
assembly pipeline contains a step to detect and fix misassembly
errors; none were found in the genome of E. coli
2009-46.
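
For readers unfamiliar with the measure, GC skew is conventionally computed per window as (G - C) / (G + C). In a correctly ordered bacterial chromosome its sign tends to switch near the replication origin and terminus, which is why frequent fluctuation is informative. The following Python sketch shows the calculation on a toy sequence; the window size and the sequence are hypothetical, and the CGview settings used for Figure 3 are not specified here.

```python
def gc_skew(sequence, window=1000):
    """Return (G - C) / (G + C) for consecutive non-overlapping windows."""
    seq = sequence.upper()
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # Windows without any G or C are reported as zero skew.
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


# Toy sequence (hypothetical): a G-rich half followed by a C-rich half,
# mimicking the sign change between two replication arms.
toy = "GGGGCA" * 2 + "CCCCGT" * 2
values = gc_skew(toy, window=12)
print(values)
```

Scaffolds placed in the wrong order or orientation would interleave positive and negative windows, producing the kind of fluctuating profile described above.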

Comparison of the gene content between E. coli 2009-46
and the finished E. coli 536 reference genome identified
164 annotated gene functions predicted to be present only
in E. coli 2009-46. Included among these are several genes
related to scavenging iron, a type VII secretion system,
an IncF conjugation system, mediators of hyperadherence,
and copper and mercury resistance genes. The full list of
gene functions found only in 2009-46 and those which
2009-46 lacks relative to the reference isolate are listed in
Additional files 1 and 2, respectively.
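
A pairwise comparison of this kind reduces to two set differences over the annotated function labels. The Python sketch below illustrates the operation under the assumption that each genome's annotation has been exported as a list of function names; the function `compare_functions` and the example labels are hypothetical and do not reproduce the contents of the additional files.

```python
def compare_functions(isolate_functions, reference_functions):
    """Return (only_in_isolate, only_in_reference) as sorted lists."""
    isolate = set(isolate_functions)
    reference = set(reference_functions)
    return sorted(isolate - reference), sorted(reference - isolate)


# Hypothetical function labels for illustration only.
isolate_annot = ["Iron siderophore receptor", "Mercuric reductase", "DNA gyrase subunit A"]
reference_annot = ["DNA gyrase subunit A", "Hypothetical adhesin"]

only_isolate, only_reference = compare_functions(isolate_annot, reference_annot)
print(only_isolate)
print(only_reference)
```

Because the comparison is on function labels rather than sequences, genes annotated with slightly different wording in the two genomes would be counted as differences, so such lists benefit from manual review.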

The blaTEM-1 gene, conferring resistance to ampicillin,
was present on scaffold 78.1 (2551 nt), while the sul2-strA-strB
genes conferring resistance to sulphonamides
and streptomycin were located on scaffold 67.1, which is
5064 nt long. The ends of both scaffolds had a partial
copy of the insertion element IS26. The isolate also houses
a clinical class 1 integron and two associated resistance
genes on scaffold 71.1. One of the two resistance genes
is a variant of the dihydrofolate reductase (dhfr) gene, which
provides trimethoprim resistance, and the other
confers resistance to aminoglycoside antibiotics (aadA).
However, scaffold 71.1 is only 3,863 nt long and also has a
copy of IS26 at both ends. We identified the presence of
the 3'-CS of a class 1 integron on scaffold 58.1 (6679 nt
long), which had an IS26 on one end and an IS1 element
on the other. The presence of IS26 elements at both ends of
seven scaffolds has resulted in scaffold breaks, during
assembly of the genome sequence, around a region of the
genome that most likely harbours a complex resistance
locus (CRL). We were therefore unable to confirm
the exact genomic location of the CRL or resistance
genes.

Figure 3 CGview plot of the E. coli 2009-46 genome. The two outermost circles in the figure contain a series of arrows in opposite directions
representing predicted ORFs (greater than 100 codons) on the two strands of DNA sequence. The solid line, forming the third ring (from outside),
indicates BLASTn analysis (set with a cutoff of 1e-10) of the isolate against the E. coli 536 genome. Relative GC content along the length of the
genome is plotted as a graph in the black circle. The GC content clearly indicates multiple regions of GC content variation along the genome,
possibly indicating lateral gene transfer events.

Antibiotic resistance profile 

The antibiotic resistance profile of E. coli 2009-46
was experimentally determined using the disk diffusion
method. This strain was found to be resistant to ampicillin,
trimethoprim, sulphafurazole, tetracycline, streptomycin,
apramycin, kanamycin, and azithromycin. A
full list of antibiotics tested and the susceptibility of
E. coli 2009-46 to each is provided in Additional file 3.

To better understand the genomic basis for the observed 
antibiotic resistance traits, the genome was searched for 
specific genes known to confer antibiotic resistance. A list- 
ing of these genes and their presence or absence in E. coli 
2009-46 is provided in Additional file 3. 

Future directions 

Improved efficiency of clinical genomics pipelines will 
eventually enable fine-scale epidemiological monitoring 
of E. coli outbreaks in real time. When fully developed, 
this capacity will influence clinical and public health decisions
related to treatment and control of pathogen outbreaks.
Genomic data such as those presented here will aid in
the interpretation of data from future outbreaks.

Availability of supporting data 

The draft genome assembly has been submitted to 
NCBI and is associated with BioSample accession 
SAMN02725027. Genome annotations are available from 
the RAST web server under accession 562.3620. The 
Illumina sequence reads have been deposited to the 
Short Read Archive under accessions SRX514806 and
SRX514807.

Abbreviations

CDS: Coding DNA sequences; ORF: Open reading
frame; RAST: Rapid annotation using subsystem technology;
A5: Andrew and Aaron's Awesome Assembly; gDNA:
genomic DNA; nt: Nucleotides.

Additional files 



Additional file 1: Gene functions (as identified by RAST subsystems)
found to be present in the newly sequenced E. coli 2009-46 isolate
but not the E. coli 536 reference genome.

Additional file 2: Gene functions (as identified by RAST subsystems)
found to be present only in the E. coli 536 reference genome in a
pairwise comparison with E. coli 2009-46.

Additional file 3: Details on PCR cartography, virulence gene
searches, and antibiotic resistance assays.